Methods, systems, and computer program products for creating three-dimensional video sequences

ABSTRACT

Methods and systems for creating three-dimensional video sequences of a scene are disclosed. An example method can include receiving multiple frames of a scene. The method may include selecting a target frame from among the multiple frames; selecting, from among the multiple frames, a first subset of frames, N, that are associated with the target frame and representative of a large stereo baseline; and analyzing the first frame subset to identify two images for forming a stereoscopic pair of frames. Further, the method includes extracting depth data of static objects in the stereoscopic pair. The method includes selecting a second subset of frames that are associated with the target frame and representative of a smaller stereo baseline than that represented by N; and utilizing the second frame subset to calculate depth of moving objects. The method includes generating a three-dimensional video frame based on the depth data.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. utility patent application Ser. No. 13/288,209, filed Nov. 3, 2011, which claims the benefit of U.S. provisional patent application Ser. No. 61/409,664, filed Nov. 3, 2010; the disclosures of which are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The subject matter disclosed herein relates to generating a video sequence of a scene. In particular, the subject matter disclosed herein relates to methods, systems, and computer program products for using a two-dimensional video sequence of a scene to create a three-dimensional video sequence of the scene.

BACKGROUND

Stereoscopic, or three-dimensional, video is based on the principle of human vision. Video is a sequence of captured images (or frames), each of which, when combined with camera displacement, can record the same object(s) or new objects from slightly different angles. In such a case, the captured sequence can then be transferred to a processor that may assign the captured sequence as the view for one eye (i.e., left or right eye), may analyze the individual frames and possibly interpolate additional frames/frame views, and may, for each frame, generate a corresponding view for the other eye. The two resulting video sequences may then be combined to create a three-dimensional video sequence. The resulting three-dimensional video sequence can further be encoded using, but not limited to, one of the popular video encoding formats such as motion JPEG, MPEG, H.264, and the like. The video sequence can further be stored with audio to a digital medium using a format such as, but not limited to, .avi, .mpg, and the like.

Many techniques of viewing stereoscopic video have been developed and include the use of colored or polarizing filters to separate the two views, temporal selection by successive transmission of video using a shutter arrangement, or physical separation of the two views in the viewer and projecting them separately to each eye of a viewer. In addition, display devices have recently been developed that are well-suited for displaying stereoscopic images and videos. For example, such display devices include, but are not limited to, digital still cameras, personal computers, digital picture frames, set-top boxes, high-definition televisions (HDTVs), and the like.

The use of digital image capture devices, such as, but not limited to, digital still cameras, digital camcorders (or video cameras), and phones with built-in cameras, for use in capturing digital images has become widespread and popular. Because video sequences captured using these devices are stored in a digital format, such video can be easily distributed and edited. For example, the videos can be easily distributed over networks, such as the Internet. In addition, the videos can be edited by use of suitable software on the image capture device or a personal computer.

Video sequences captured using conventional single lens, single sensor image capture devices are inherently two-dimensional. While dual lens/sensor combinations can be used to create three-dimensional content, it is desirable to provide methods and systems for using these conventional devices for generating three-dimensional videos.

SUMMARY

Methods, systems, and computer program products for creating three-dimensional video sequences of a scene are disclosed herein. Particularly, embodiments of the presently disclosed subject matter can include a method that uses a processor and memory for receiving a two-dimensional video sequence of a scene. The two-dimensional video sequence can include multiple frames. The method may also include selecting a target frame, T, from among the multiple frames. Further, the method may include selecting a first subset of frames representative of a large camera displacement, N, from among the multiple frames that are associated with the target frame T. The method may also include analyzing the first subset of frames to identify two images for use in forming a stereoscopic pair of frames with a predetermined spatial difference. Further, the method may include extracting depth data of static objects in the stereoscopic pair of frames. The method may also include selecting a second subset of frames representative of a small camera displacement, n (n<<N), from among the multiple frames that are associated with T. The method may include utilizing the second subset of frames to calculate depth of moving objects. The method may also include combining the static and moving objects based on the depth data. Further, the method may include generating a three-dimensional video frame corresponding to the target frame based on the depth data.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing summary, as well as the following detailed description of various embodiments, is better understood when read in conjunction with the appended drawings. For the purposes of illustration, there is shown in the drawings exemplary embodiments; however, the present subject matter is not limited to the specific methods and instrumentalities disclosed. In the drawings:

FIG. 1 is a front view of a user holding a camera and moving the camera for creating an initial panning sequence of a scene in accordance with embodiments of the present subject matter;

FIG. 2 is a diagram depicting a top view of an example panning sequence that may be implemented by use of an image capture device in accordance with embodiments of the present subject matter;

FIG. 3 is a block diagram of an example image capture device including an image sensor and a lens for use in capturing a two-dimensional video sequence of a scene according to embodiments of the presently disclosed subject matter;

FIGS. 4A and 4B depict a flow chart of an example method for creating a three-dimensional video sequence of a scene using the image capture device, alone or together with any other suitable device, in accordance with embodiments of the present disclosure;

FIGS. 5A and 5B depict a flow chart of an example method for creating a three-dimensional video sequence of a scene in accordance with embodiments of the present subject matter;

FIG. 6 is an example method for depth creation according to an embodiment of the present subject matter;

FIG. 7 is a flow chart of an exemplary method for depth creation using a macro-stereo based technique according to an embodiment of the present subject matter;

FIG. 8 is a flow chart of an example method for depth creation using a micro-stereo based technique according to an embodiment of the present subject matter;

FIG. 9 is a diagram showing depth calculation using a micro-based technique in accordance with an embodiment of the present subject matter;

FIG. 10 illustrates diagrams of a micro stereo based technique using frame-by-frame analysis in accordance with an embodiment of the present subject matter; and

FIG. 11 illustrates an exemplary environment for implementing various aspects of the subject matter disclosed herein.

DETAILED DESCRIPTION

The subject matter of the present disclosure is described with specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or elements similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the term “step” may be used herein to connote different aspects of methods employed, the term should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.

Methods, systems, and computer program products for creating three-dimensional video sequences are disclosed. According to an aspect, a method includes receiving a two-dimensional video sequence of a scene. For example, a camera or other image capture device may capture the two-dimensional video sequence. The two-dimensional video sequence can include a plurality of frames. The method also includes selecting a target frame from among the plurality of frames. Further, the method includes selecting a subset of frames from among the plurality of frames that are associated with the target frame. The method also includes determining a depth of the scene based on the target frame and the subset of frames. Further, the method includes generating a three-dimensional video frame corresponding to the target frame and based on the determined depth.

Embodiments of the present subject matter relate to an image capture device, such as a camera, that allows a user to capture a two-dimensional video sequence or use a stored two-dimensional video sequence for generating a three-dimensional video sequence based on the two-dimensional video sequence. The functions disclosed herein can be implemented in hardware, software, and/or firmware that can be executed within the image capture device. Example image capture devices include, but are not limited to, a digital still camera, a video camera (or camcorder), a personal computer, a digital picture frame, a set-top box, an HDTV, a phone, and the like.

According to one or more other embodiments of the present subject matter, a method can include the use of macro stereo base techniques (i.e., utilizing two or more frames captured at a large horizontal displacement of the capture device in the original video sequence) to create a very accurate representation of the background and the non-moving objects in the scene. Further, the method can include the use of micro stereo base techniques (i.e., utilizing two or more frames captured at a small horizontal displacement of the capture device in the original video sequence) to create a very accurate representation of close as well as moving objects. Such three-dimensional video sequences can be viewed or displayed on a suitable stereoscopic display.

The functions and methods described herein can be implemented on an image capture device capable of capturing still images and executing computer executable instructions or computer readable program code on a processor. The image capture device may be, for example, a digital still camera, a video camera (or camcorder), a personal computer, a digital picture frame, a set-top box, an HDTV, a phone, or the like. The functions of the image capture device may include methods for selecting video segments, creating corresponding views for each image in the sequence, rectifying and registering at least two views, matching the color and edges of the views, performing stabilization of the sequence, altering the perceived depth of objects, and display-specific transformation to create a single, high-quality three-dimensional video sequence.

Image capture devices as disclosed herein may be utilized in various types of shooting modes for capturing a video sequence. In an example shooting mode, a video camera may remain static. For example, the video camera may be mounted on a tripod or otherwise kept stationary during image capture. A user may control the video camera to capture images of an event that is fully or partially contained within a fixed field of view of the video camera. The video camera may pan left, right, or in another suitable direction for capturing panoramic scenery of the event.

In another example shooting mode, a user may hold the camera during use of the camera to capture images of an event. The event may be fully or partially contained within a fixed field of view of the camera. In contrast with the aforementioned example of using a tripod or otherwise keeping the camera stationary, the camera may not be kept perfectly static in this example, because it can be difficult for someone holding the camera to keep it still. In this case, there may be some vertical and/or horizontal movement even if anti-shaking techniques are implemented on the camera.

In yet another example shooting mode, a user may pan the camera or move in various directions to either follow an object that is moving from the camera's field of view, or to refocus on a different object within the same field of view. The panning movement may be a parallel and/or rotating movement of the camera.

According to an embodiment, a method may include utilizing macro stereo base techniques to estimate depth when there is a significant movement of the camera. The results from the micro stereo base techniques may be further used in conjunction with the macro stereo base results to estimate the depth of a scene when the movement of the camera is very small.

In an embodiment, a first step in creation of a stereoscopic sequence is to define an initial three-dimensional representation of a scene or environment. Once a three-dimensional static space has been defined, moving objects may be identified. A depth of the moving objects can be estimated. Further, the objects may subsequently be placed at a suitable depth in the previously constructed scene. The moving objects may be tracked while they are moving in the scene, and the location of the objects may be adjusted accordingly. Other objects may enter the field of view, or other static objects may appear in the scene as a result of camera panning. These other objects may also be detected and incorporated at a suitable depth into the scene.

FIG. 1 illustrates a front view of a user 100 holding a camera 102 and moving the camera 102 for creating an initial panning sequence of a scene (not shown) in accordance with embodiments of the present subject matter. A method of creating a three-dimensional video sequence of the scene may include creating a panning sequence of the scene by having a user move the camera in any direction (e.g., generally to the right or left). In this example, the user 100 is moving the camera 102 to his right as indicated by the direction arrow 104. Alternatively, the user 100 may move the camera 102 to his left or in any other suitable direction. As the camera 102 is moved, the camera 102 may capture a two-dimensional video sequence of the scene. The two-dimensional video sequence may include multiple frames of images. After the initial panning sequence has been completed, the user 100 can keep the camera 102 on the same field of view to continue capturing a moving object that remains in the same place (e.g., a person talking), or the user 100 can move the camera 102 to re-center on either a different object or a new object entering or leaving the field of view. The captured video sequence, which can be a combination of the previously described shooting techniques, may be used for creating a three-dimensional video sequence in accordance with embodiments described herein.

FIG. 2 illustrates a diagram depicting a top view of an example panning sequence that may be implemented by use of an image capture device 200 in accordance with embodiments of the present subject matter. Referring to FIG. 2, the image capture device 200 is a camera positioned for capturing images of a background 202 of a scene. Objects 204 and 206 are positioned in a foreground area of the scene. In these positions, images of the objects 204 and 206 are also captured by the device 200. In this example, the device 200 is moved or panned between a position generally referenced as 208, where the device 200 is depicted by solid lines, and a position generally referenced as 210, where the device 200 is depicted by broken lines. During panning between positions 208 and 210, the device 200 captures and stores a two-dimensional video sequence of the scene. When the device 200 is located at position 208, the device 200 captures one or more images of a field of view defined by lines 212 and 214, which contains segments 216 through 218 of the background 202 of the scene, as well as images of the object 204 and part of the object 206. When the device 200 is located at position 210, the device 200 captures one or more images of a field of view defined by lines 220 and 222, which contains segments 224 through 226 of the background 202 of the scene, as well as an image of the object 206.

Method embodiments described herein can be implemented on an image capture device capable of capturing still images and video sequences. The image capture device may also be capable of displaying three-dimensional images or videos, and executing computer readable program code on a processor. Such computer readable program code may be stored on a suitable computer readable storage medium. The image capture device may be, for example, a digital still camera, a video camera (or camcorder), a personal computer, a digital picture frame, a set-top box, an HDTV, a phone, or the like. As an example, FIG. 3 illustrates a block diagram of an example image capture device 300 including an image sensor 302 and a lens 304 for use in capturing a two-dimensional video sequence of a scene according to embodiments of the presently disclosed subject matter. Further, the image capture device 300 may include a video generator 306 configured to create three-dimensional video sequences in accordance with embodiments of the presently disclosed subject matter. In this example, the image capture device 300 is capable of capturing digital video of a scene. The image sensor 302 and lens 304 may operate to capture multiple consecutive still digital images of the scene. In another example, the image capture device 300 may be a video camera capable of capturing a video sequence including multiple still images of a scene. A user of the image capture device 300 may position the system in different positions for capturing images of different perspective views of a scene. The captured images may be suitably stored and processed for creating a three-dimensional video sequence of the scene as described herein. For example, subsequent to capturing the images of the different perspective views of the scene, the image capture device 300, alone or in combination with a computer such as computer 306, may use the images for creating a three-dimensional video sequence of the scene and for displaying the three-dimensional video sequence to the user.

Referring to FIG. 3, the image sensor 302 may include an array of charge coupled device (CCD) or CMOS sensors. The image sensor 302 may be exposed to a scene through the lens 304 and a respective exposure control mechanism. The video generator 306 may include analog and digital circuitry such as, but not limited to, a memory 308 for storing computer readable program code, including computer readable code that controls the image capture device 300, together with at least one CPU 310, in accordance with embodiments of the presently disclosed subject matter. The CPU 310 executes the computer readable code so as to cause the image capture device 300 to expose the image sensor 302 to a scene and derive digital images corresponding to the scene. The digital images may be captured and stored in the memory 308. All or a portion of the memory 308 may be removable, so as to facilitate transfer of the digital images to other devices such as the computer 306. Further, the image capture device 300 may be provided with an input/output (I/O) interface 312 so as to facilitate transfer of digital images even if the memory 308 is not removable. The image capture device 300 may also include a display 314 controllable by the CPU 310 and operable to display the captured images in real-time for viewing by a user. The display 314 may also be controlled for displaying the three-dimensional video sequences created in accordance with embodiments of the present subject matter.

The memory 308 and the CPU 310 may be operable together to implement the video generator 306 for performing image processing including generation of three-dimensional images in accordance with embodiments of the presently disclosed subject matter. The video generator 306 may control the image sensor 302 and the lens 304 for capturing a two-dimensional video sequence of a scene. The video sequence may include multiple frames. Further, the video generator 306 may further process the images and generate a three-dimensional video sequence of the scene as described herein. FIGS. 4A and 4B illustrate a flow chart of an example method for creating a three-dimensional video sequence of a scene using the image capture device 300, alone or together with any other suitable device, in accordance with embodiments of the present disclosure. Referring to FIGS. 4A and 4B, the method includes using 400 an image capture device to capture a two-dimensional video sequence of a scene. For example, the video generator 306 shown in FIG. 3 may control the image sensor 302 and the lens 304 to capture a two-dimensional video sequence of a scene. The captured video sequence may include images of the same or different perspective views of the scene. The CPU 310 may then implement computer readable code stored in the memory 308 for receiving and storing the captured video sequence in the memory 308.

The method of FIGS. 4A and 4B includes selecting 402 a target frame from among frames of the captured video sequence. For example, the video generator 306 may select a target frame T from among multiple frames captured by the image sensor 302 and the lens 304. The method of FIGS. 4A and 4B includes selecting 404 a first subset of frames, N, from among the plurality of frames that are associated with the target frame T. N represents a desired macro stereo base offset of camera position from the position related to the target frame T, as measured by camera pose information gathered from the sequence or by camera positional sensor data. The method of FIGS. 4A and 4B includes analyzing 406 the first subset of frames to identify two images or frames for use in forming a stereoscopic pair of frames representative of the desired macro stereo baseline (a predetermined spatial displacement of the capture device). For example, the video generator 306 may analyze the first subset of frames for identifying the two images or frames. The predetermined spatial difference may be such that the two images provide an optimal viewing of the scene.
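
As a non-limiting illustration of the subset selection in steps 404 and 410, the frame subsets can be chosen by thresholding the camera displacement recorded for each frame. The following Python sketch assumes per-frame horizontal camera positions are already available (from pose analysis or positional sensors); the threshold values and the function name are illustrative assumptions, not part of the disclosed method.

```python
import numpy as np

def select_frame_subsets(positions, t, macro_baseline=0.07, micro_frac=0.2):
    """Select the macro (N) and micro (n) frame subsets for target frame t.

    positions: per-frame horizontal camera positions in meters.
    macro_baseline: illustrative macro stereo base (~7 cm, on the order of
    human eye separation); micro_frac scales it down for the micro base (n << N).
    """
    displacement = np.abs(np.asarray(positions, dtype=float) - positions[t])
    macro_subset = np.flatnonzero(displacement >= macro_baseline)
    micro_subset = np.flatnonzero(
        (displacement > 0) & (displacement <= macro_baseline * micro_frac))
    return macro_subset, micro_subset

# Example: a slow pan of 1 cm per frame, with frame 10 as the target frame T.
positions = np.arange(30) * 0.01
macro, micro = select_frame_subsets(positions, t=10)
print(macro)   # frames displaced by at least ~7 cm from the target
print(micro)   # immediate neighbors of the target (small baseline)
```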

The method of FIGS. 4A and 4B includes extracting 408 depth data of static objects via measurement of their pixel disparities in the stereoscopic pair of frames. For example, the video generator 306 may extract depth data of static objects in the stereoscopic pair of frames.

The method of FIGS. 4A and 4B includes selecting 410 a second subset of frames, n, from among the plurality of frames, N, that are representative of a substantially smaller stereo baseline separation of camera positions than is N. For example, the video generator 306 may select a second subset of frames from among multiple captured frames that are associated with the sequence of frames, N.

The method of FIGS. 4A and 4B includes utilizing 412 the second subset of frames, n, to measure small disparities of, and construct depth data for, moving objects. For example, the video generator 306 may utilize the second subset of frames to calculate depth data of moving objects.

The method of FIGS. 4A and 4B includes combining 414 the static and moving objects based on the depth data. For each image, image warping techniques may be used, along with the two-dimensional capture data, to generate a suitable second view for a stereo pair. For example, the video generator 306 may combine the static and moving objects based on the depth data.

The method of FIGS. 4A and 4B includes generating 416 a three-dimensional video frame corresponding to the target frame based on the depth data. For example, the video generator 306 may generate a three-dimensional video frame corresponding to the target frame based on the depth data.

In an example, the method of FIGS. 4A and 4B may include one or more steps of identifying suitable frames, registration, stabilization, color correction, transformation, and depth adjustment. Further, the method may include generating one or more additional frames and frame viewpoints using one of existing raster data and depth information. Further, the method may include using a micro or macro stereo based technique for generating image representations of close and moving objects of the scene. A display, such as the display 314 or a display of the computer 306, may display multiple three-dimensional video frames in a sequence.

FIGS. 5A and 5B illustrate a flow chart of an example method for creating a three-dimensional video sequence of a scene in accordance with embodiments of the present subject matter. The example method of FIGS. 5A and 5B may be implemented by any suitable image capture device, such as the image capture device 300 shown in FIG. 3. For example, the video generator 306 shown in FIG. 3 may control components of the image capture device 300 to implement the steps of the method of FIGS. 5A and 5B. Referring to FIG. 5, the method includes segmenting 500 a video stream into panning sequences based on camera motion. In an example, such segmenting may include identifying instances in which a camera pans horizontally or vertically, instances in which the camera is still, combinations of these, and the like. Each instance of horizontal panning (alone or in combination with other camera motion) may represent an instance of a macro stereo base, while still instances and instances dominated by only vertical motion may utilize a micro stereo base. Sequences can be divided by utilizing a combination of a scene change detection technique and camera pose information. Camera pose can be generated using any combination of suitable on-board motion sensor data and suitable computer vision techniques that utilize camera calibration matrices.
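
One way to realize the motion-based segmentation of step 500 is to threshold the per-frame camera displacement and split the stream wherever the panning state changes. The sketch below is a simplified, assumption-laden illustration: it uses only horizontal position data and a single hypothetical threshold, and omits the scene change detection that is also called for above.

```python
import numpy as np

def segment_panning_sequences(x_positions, pan_threshold=0.002):
    """Split a video into panning sequences from per-frame camera x-positions.

    x_positions: per-frame horizontal camera position (meters), from pose
    analysis or on-board motion sensors. pan_threshold is an illustrative
    per-frame displacement above which the camera is treated as panning
    horizontally (macro stereo base available); below it the segment is
    treated as still or vertical-only (micro stereo base).
    """
    x = np.asarray(x_positions, dtype=float)
    velocity = np.diff(x, prepend=x[0])
    panning = np.abs(velocity) > pan_threshold
    segments, start = [], 0
    for i in range(1, len(panning)):
        if panning[i] != panning[i - 1]:          # motion state changes here
            segments.append((start, i - 1, bool(panning[i - 1])))
            start = i
    segments.append((start, len(panning) - 1, bool(panning[-1])))
    return segments  # list of (first_frame, last_frame, is_panning)

# Example: 20 frames of panning followed by 10 still frames.
x = np.concatenate([np.arange(20) * 0.01, np.full(10, 0.19)])
print(segment_panning_sequences(x))
```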

Subsequent to step 500, processing for each panning sequence is implemented as shown in FIGS. 5A and 5B. The processing of each panning sequence begins at step 502. For each target frame, T, a subset of frames, N, representing a desired horizontal displacement of the capture device, herein referred to as the macro stereo base, can be identified. In the event that a given panning sequence does not displace the camera enough to reach the targeted stereo baseline, a lesser value may be used, provided it is representative of a macro stereo base, typically representing a distance at least as wide as typical human eye separation (6-7 cm), and preferably longer. Measurement of this displacement can be performed using either analysis of camera pose or positional sensor data from the camera. Further, following step 502, processing for each video frame in the panning sequence is implemented in a process which begins at step 504. Once such panning sequence information is gathered, the resulting video segment can be used to create an accurate depth representation of the scene (step 506). The accuracy of these measurements may be further enhanced via interpolation (upsampling) of intermediate frame data, if needed. The captured video sequence can be partitioned into panning sequences that are processed separately (e.g., beginning at step 502). A new panning sequence can be defined when there is a significant or predetermined change in the rate at which the video sequence is captured (e.g., the camera is panning, remaining static, accelerating/decelerating in one direction, and the like) and/or a significant change in the contents of the scene (e.g., objects passing into or out of view). In each panning sequence, the individual frames can be processed to determine the depth of the objects. For each frame, object segmentation can be performed to identify static and moving objects.

Extraction of the depth information can be a two-step process. During the first step, macro stereo base techniques can be used to identify depth of static objects. During the second step, micro stereo base techniques can be used to identify the depth of moving objects. FIG. 6 illustrates an example method for depth creation according to an embodiment of the present subject matter. Referring now to FIG. 6, the method includes segmenting video frames into moving and static objects (step 600). At step 602, the method includes using macro-stereo base techniques to estimate a depth of the static objects. At step 603, the method may include estimating the camera movement. At step 604, the method includes using micro stereo base techniques to estimate depth of moving objects. Next, at step 606, the method includes combining the estimated depths from the micro- and macro-stereo base techniques to create a depth map of a video frame.

Referring now to FIG. 7, which illustrates a macro stereo base technique in accordance with embodiments of the present subject matter, the method includes identifying an ideal stereo base for static objects (step 700), which selects the subset of frames, N, from the current panning sequence of frames (step 702). Further, the method includes registering the selected frames (step 704) and calculating disparity (step 706). The method also includes assigning depth on static objects (step 708).

The initial panning sequence may include a collection of frames that are taken at different locations. Each symmetric pair of such a collection can create a stereoscopic representation of static objects or subjects in the scene at different depths.

Objects that are relatively far from the capture device may require a larger stereo base (defined herein as a macro stereo base) to accurately estimate depth of the objects. Referring to FIG. 2 for example, and during the capture of a video sequence, N frames may have been captured between positions 208 and 210 of the image capture device 200. Each pair of those frames around the center of the 208 and 210 positions can form a stereoscopic pair of different stereo base and therefore a different depth perception. For accurate representation of the depth, it may be more ideal that the image capture device, between positions 208 and 210, moves laterally without toeing out. Based on the example of FIG. 2, a pair of those frames may be used to create an initial three-dimensional representation of the space outlined by lines 214 and 220 that includes the segment between 218 and 224, the object 204, and a small portion of object 206. By continuing this example method, other areas can be covered to generate a stereoscopic view of the area of interest. For example, FIG. 7 illustrates a flow chart of an exemplary method for depth creation using a macro-stereo based technique according to an embodiment of the present subject matter. For such a macro stereo base, measurement of the depth of individual objects may be accomplished by analyzing image pairs at an interval N, such that the interval is representative of a target stereo baseline, or by maintaining depth values of those same objects in the case that the capture device is no longer being displaced (between panning sequences). Image pairs are registered and processed by use of suitable techniques. The displacement, or disparity, of objects in the image pairs may then be measured and translated into depth under the general stereo equation h = Baseline * FocalLength / disparity, where h is the depth, Baseline is the distance of camera movement in the interval of N frames, and FocalLength is the focal length of the camera lens. This macro-stereo base technique may allow identification of disparities of, and assignment of depth to, static objects in the scene. In an embodiment, a dense disparity map may be generated, producing a disparity result (and hence, a depth estimate) for each pixel in the frame. In another embodiment, in part because the calculation of a dense disparity map may be difficult, techniques using feature detection and tracking can be used to generate only a sparse disparity map for a subset of pixels. Additionally, moving objects may be identified and ignored during this process.
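
The stereo relation just described maps measured disparities directly to depth once the baseline and focal length are known. A minimal sketch follows; the baseline, focal length, disparities, and function name are illustrative assumptions chosen only to show the arithmetic.

```python
import numpy as np

def disparity_to_depth(disparity_px, baseline_m, focal_length_px):
    """Apply the stereo relation h = Baseline * FocalLength / disparity.

    disparity_px: per-pixel (dense) or per-feature (sparse) disparities in
    pixels, measured between the registered macro stereo pair.
    baseline_m:   camera displacement over the N-frame interval, in meters.
    focal_length_px: lens focal length expressed in pixels.
    Pixels with zero or negative disparity are returned as infinite depth.
    """
    disparity = np.asarray(disparity_px, dtype=float)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = baseline_m * focal_length_px / disparity[valid]
    return depth

# Example: three tracked features with disparities of 40, 10 and 2 pixels
# for a 10 cm baseline and an 800-pixel focal length.
print(disparity_to_depth([40.0, 10.0, 2.0], baseline_m=0.10, focal_length_px=800))
# -> [ 2.  8. 40.] meters: larger disparity means a closer object.
```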

Returning to FIGS. 5A and 5B, the method may include a small stereo baseline (micro-stereo) analysis to calculate depth of moving objects and near objects for which disparity techniques are weak at larger stereo baselines (e.g., step 504). Prior to micro-stereo analysis, the movement of the camera may be computed to increase the accuracy of the micro-stereo analysis since it relies on small camera movements (step 603 of FIG. 6).

For each frame in the sequence, camera movement can be recorded via a suitable module such as, but not limited to, a gyroscopic sensor residing in the camera. Another method of identifying the positioning of the camera includes analyzing two frames, identifying key points in both frames, rectifying and extracting the fundamental matrix relationship, and combining with camera parameters (focal length and the like) to generate the projective camera matrices for each position.
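
The key-point based pose estimation just described can be sketched with a feature-matching pipeline. The code below uses OpenCV for illustration; the choice of ORB features, the RANSAC threshold, and the function name are assumptions rather than the disclosed implementation, and only the direction of translation (not its scale) is recovered this way.

```python
import numpy as np
import cv2  # OpenCV is one possible implementation choice, used here for illustration

def relative_pose(frame_a, frame_b, K):
    """Estimate camera rotation R and translation direction t between two frames.

    frame_a, frame_b: grayscale images of the same scene from nearby positions.
    K: 3x3 intrinsic matrix (focal length and principal point).
    Key points are matched with ORB features; the essential matrix is then
    decomposed into R and t, mirroring the key-point / matrix approach above.
    """
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    E, mask = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=mask)
    return R, t  # t is a unit-length translation direction; its scale is unknown
```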

Analysis of the motion vectors of each pixel within the context of object segmentation can provide detailed information of the depth of a scene. Analysis of motion vectors of static objects, due to camera movement, can detect the movement of an image capture device. If an object is at the same depth, motion vectors of that object can accurately detect the lateral (horizontal and/or vertical) movement of the camera. If parts of an object reside at various depths, the motion vectors of the individual pixels can accurately detect the rotational movement of the image capture device.

In an example, FIG. 8 illustrates a flow chart of an example method for depth creation using a micro-stereo based technique according to an embodiment of the present subject matter. This may be accomplished by analyzing image pairs at an interval n, such that n<<N. The value of n may be the equivalent of the motion incurred in panning for 1/60 to 1/15 of a second. Each pair of images may again be registered and processed with suitable techniques. Disparities of objects in the scene that may have been previously recognized as moving or non-static can be measured using the same general techniques, and again, depths can be assigned. Since the camera movement during the interval of n frames may be relatively small, this represents a micro-stereo based approach that allows evaluation of object depth for non-static objects with a high degree of confidence, such as may not be possible by use of a macro-stereo technique, due to temporal latency. Also compensating for camera movement, as described at step 603 of FIG. 6, can further improve the quality of micro-stereo analysis.

Referring now to FIG. 8, the method includes analyzing video frames between 1 and n, where n is less than N (step 800). The method also includes selecting two video frames of distance n to match the stereo base (step 802). Further, the method includes registering the selected frames (step 804). The method also includes calculating disparity (step 806) and assigning depth of close objects and objects identified as moving (step 808).
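
Assuming the two frames selected in steps 802-804 are already registered, step 806 can be sketched with an off-the-shelf block matcher. The matcher choice, its parameters, and the masking of non-moving pixels below are illustrative assumptions rather than the disclosed implementation.

```python
import numpy as np
import cv2  # used for illustration; any block-matching disparity estimator would do

def micro_disparity(frames, t, n, moving_mask):
    """Measure small disparities for moving objects between frames t and t+n.

    frames: list of registered, grayscale (8-bit) frames from one panning sequence.
    n: micro interval (n << N), roughly the motion incurred in 1/60 to 1/15 s.
    moving_mask: boolean array marking pixels previously segmented as moving.
    """
    left, right = frames[t], frames[t + n]
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=32, blockSize=7)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed point
    disparity[~moving_mask] = 0.0     # keep only the objects identified as moving
    return disparity
```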

FIG. 9 illustrates a diagram showing depth calculation using a micro-based technique in accordance with an embodiment of the present subject matter. Referring to FIG. 9, a scene including a background area generally designated 900, a moving object 902 (person), a static object 904 (window), and another static object 906 (base) is captured using a camera. During the initial panning sequence, two frames taken at times t and t+H (indicated by frames 908 and 910, respectively) have been used to construct the macro stereo base depth representation of the scene. The objects captured in the first frame 908 are designated by “a” (e.g., objects 902 a, 904 a, and 906 a), and the objects captured in the second frame 910 are designated by “b” (e.g., objects 902 b, 904 b, and 906 b). The disparity between the moving object 902 a, 902 b in the frames 908 and 910 is indicated by Dm, the disparity between the static object 904 a, 904 b in the frames 908 and 910, which is further back in the background 900, is indicated by Dw, and the disparity between the static object 906 a, 906 b is indicated by Dv. In this particular example, Dm<Dv<Dw.

During a panning sequence, if the object 902 a, 902 b is moving, a smaller stereo base technique can be used for the depth analysis. Under the micro stereo base analysis shown on the right side of FIG. 9, two frames 912 and 914 that are close in time are examined (t and t+h, where h is much smaller than H). In such a case, the movement of the object 902 a, 902 b is very small and conceptually static. In an example, motion estimation techniques may be used to compensate for the motion of the moving object. By disabling any motion stabilization capabilities inside the camera, there can be some natural micro-stereo base because of hand shaking, and this can result in movement of the camera 901 from positions 916 to 918. The process may be repeated multiple times with different values of n (where n<h) until a disparity value for each object identified as moving is available that discounts most or all of the motion of the object. It is noted that the disparity values may only be important in relative relation to the image, and need not represent specifically accurate disparity for each object.

Now referring back to FIGS. 5A and 5B at step 506, at the point of having data from step 508 in FIGS. 5A and 5B and step 808 in FIG. 8, macro and micro stereo base information can be combined. Disparity information from the macro stereo base calculations is assigned to the objects identified as “non-moving” or distant, relative to a given camera separation represented by a set of frames, N. Disparity information must then be added for moving and close objects using the result from the micro analysis. For close objects, this may be performed simply by noting the disparity result(s) from step 808 and the ratio between the specific value of n used to assign the value and N. The disparity for these pixels can become the value from step 808 multiplied by the factor N/n.

In FIG. 9, a Dm′ disparity of the object 902 a, 902 b can be compared with the Dw′ disparity of the known static object 904 a, 904 b and the Dw disparity of the static object in the macro stereo base case.

For moving objects, the relative relationships of the same pixels in the different analyses may be relied upon. From the micro calculation (e.g., step 808), a value of n can be identified for which a disparity can be assigned to the pixels in question. This disparity can have a relation to other pixels in the scene, and specifically, to pixels identified and placed in a previous video segment, or to pixels identified and placed at a particular static disparity in step 708. For pixels that correspond to those placed in an earlier segment, the depth assigned in that segment can be extrapolated to the current segment, while recognizing a possibility of approach toward or retreat from the camera.

For pixels not previously assigned, the ratios of the depth and disparity of known objects can be used to place unknown objects. Without loss of generality, a pixel classified as moving at coordinate (j, k) in the micro analysis can be assumed to have an assigned disparity d₁ for the selected frame interval n, and a pixel at coordinate (m, n) can be assumed to have a static disparity D₂ for separation N in the macro analysis and a value of d₂ for the selected frame interval n in the micro analysis. The disparity assigned to the moving pixel in the final combined map can subsequently be calculated as D₁ = D₂*d₁/d₂. Repetition of this process for multiple small intervals of n across a video segment can also be used to place approaching and retreating objects in a scene.
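
The combination described in the preceding paragraphs can be sketched as follows. The masks, the reference static pixel, and the function name are illustrative assumptions: static pixels keep the macro result, close objects are rescaled by the baseline ratio N/n, and moving pixels are placed with D₁ = D₂*d₁/d₂ relative to a known static pixel.

```python
import numpy as np

def combine_disparities(macro_disp, micro_disp, moving_mask, close_mask,
                        N, n, ref_static_pixel):
    """Merge macro (static) and micro (moving / close object) disparity maps.

    macro_disp, micro_disp: 2D disparity maps from the macro and micro analyses.
    moving_mask, close_mask: boolean maps of pixels classified as moving / close.
    N, n: macro and micro frame intervals (n << N).
    ref_static_pixel: (row, col) of a static pixel visible in both analyses.
    """
    combined = macro_disp.astype(float).copy()
    # Close objects: micro disparity scaled up by the baseline ratio N/n.
    combined[close_mask] = micro_disp[close_mask] * (N / n)
    # Moving objects: relative placement D1 = D2 * d1 / d2 against the reference.
    r, c = ref_static_pixel
    D2, d2 = macro_disp[r, c], micro_disp[r, c]
    if d2 != 0:
        combined[moving_mask] = D2 * micro_disp[moving_mask] / d2
    return combined
```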

In one case, a suitable n may not be found, and other micro-based methods can be employed to calculate the depth of moving objects or movement of the camera. Utilization of those techniques may apply to values of n ranging from one to two. In other words, successive frames may be used to measure movement of various subjects.

At any given time, each of the above shooting modes can create movement of subjects that can be classified as one of the following categories: absolute movement of objects in the scene; and movement of an image capture device that results in a movement of static objects in the scene (global motion for all static pixels in the scene) and a relative motion for the moving objects in the scene (global minus absolute motion). In the category of absolute movements of objects in the scene, this can be a three-dimensional movement where objects travel in x, y, and z (i.e., depth) dimensions. In the category of movement of an image capture device, the movement of the camera can be three-dimensional. Further, in this category, besides movement on the horizontal and vertical coordinates, movement on the depth (i.e., z-plane) can be caused by the actual camera moving closer or further away from the object or by utilizing the zoom capability of the image capture device. Further, movement of the image capture device can be either lateral or rotational.

Because the movement of an image capture device can be rotational, global motion may not be identical for all the static pixels in the scene. Pixels closer to the image capture device may have larger motion vectors and pixels far away from the image capture device may have smaller magnitude motion vectors compared to the closer objects. In addition, the movement of the image capture device may not be constant and is expected to change over time. Therefore, at any given time each pixel (P) located at an ‘i’ horizontal, a ‘j’ vertical, and a ‘k’ depth coordinate can potentially have a different static motion vector caused by image capture device movement, mvsx(t, i, j, k), mvsy(t, i, j, k) and mvsz(t, i, j, k) for horizontal, vertical, and z-plane movement, as well as an absolute motion vector caused by the movement of the object, mvmx(t, i, j, k), mvmy(t, i, j, k) and mvmz(t, i, j, k) for horizontal, vertical, and depth movements. Knowing the location of any static pixel (Ps) in a frame (t), its location at time (t+1) can be found using the following equation:

Ps(t+1, i, j, k) = Ps(t, i+mvsx(t, i, j, k), j+mvsy(t, i, j, k), k+mvsz(t, i, j, k))  (Equation 1)

Any moving pixel (Pm) in a frame (t+1) can be found in frame (t) using the following equation:

Pm(t+1, i, j, k) = Pm(t, i+mvsx(t, i, j, k)+mvmx(t, i, j, k), j+mvsy(t, i, j, k)+mvmy(t, i, j, k), k+mvsz(t, i, j, k)+mvmz(t, i, j, k))  (Equation 2)

The static motion vectors (mvsx, mvsy, mvsz) can be calculated by analyzing the captured frames, identifying highly correlated static points (Ps) on the images, rectifying and registering the frames, and then calculating the inverse registration transform that can result from the movement of those points into the three-dimensional space (Equation 1).

Once the static motion vectors have been calculated, the absolute motion vectors (mvmx, mvmy, mvmz) can be calculated by performing the same operation for the Pm points using Equation (2).

By subsequently subtracting the static motion vectors from the combined motion vectors, the absolute motion vectors for moving objects in the scene can be obtained. With the present analysis, the camera movement, the static background, and the moving objects can be fully defined in the three-dimensional space.
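
Equations 1 and 2 imply that the motion measured for a moving pixel is the sum of the camera-induced (static) component and the object's own (absolute) component, so the absolute component falls out by subtraction. A minimal numerical sketch, with illustrative array shapes and values only:

```python
import numpy as np

def absolute_motion(combined_mv, static_mv):
    """Recover absolute object motion by removing camera-induced motion.

    combined_mv: per-pixel motion vectors measured between frames t and t+1,
                 shape (H, W, 3) for the x, y and z (depth) components.
    static_mv:   camera-induced (global) motion field for the same pixels,
                 estimated from highly correlated static points (Equation 1).
    Per Equation 2, the measured motion of a moving pixel is the sum of the
    static and absolute components, so subtracting yields the absolute motion.
    """
    return combined_mv - static_mv

# Example: one pixel with measured motion (5, 1, 0) while the camera pan alone
# would move a static pixel by (3, 0, 0): the object itself moved by (2, 1, 0).
combined = np.array([[[5.0, 1.0, 0.0]]])
static = np.array([[[3.0, 0.0, 0.0]]])
print(absolute_motion(combined, static))
```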

Non-flying objects can have an anchor point, which for most static objects resting on the ground is the ground itself. The initial position of such objects is based on their anchor points to the ground. The ground, or in general any static object, can be placed in the three-dimensional space by using the following technique:

-   First, the image can be analyzed to detect all static objects by looking at the disparity between two images comprising a stereo pair during macro stereo-base analysis;
-   The image can be segmented into static objects;
-   In each object, key features can be identified that are also key features on a corresponding stereo pair during macro analysis; and
-   The disparity of the key points provides a surface in the three-dimensional space which is the static object.

For flying objects, as well as non-flying objects in certain instances, their trajectory path can be identified to determine their path across other objects. If their path hides static objects, it can be implied that they are in front of those objects. If their path places them behind objects, it can be implied that they are behind those objects.

In case there is not much movement of the camera (both laterally and rotationally), the background can remain constant, and its depth has already been estimated. For moving objects, once an initial estimate is obtained, motion compensation techniques can be used to estimate the speed of the object in the horizontal and vertical dimensions, and rate of scaling methods can be used to estimate its speed on the z-plane. Based on those estimates, the proper depth representation can be created.

Returning to FIGS. 5A and 5B, the method further includes the dynamic adjustment of the background or static objects when the user pans the camera left or right to maintain the high depth accuracy obtained using the large stereo base (step 508). This adjustment may be performed by looking at the depth map gradient of the different surfaces that comprise the background scene and interpolating based on the amount of movement.

The method of FIGS. 5A and 5B further includes the dynamic adjustment of moving objects (step 510). For example, the size of moving objects may be compared with the size in the previous frames, and if the size gets larger the object is appropriately adjusted on the depth map by taking into consideration the size of the increase and the time frame over which this increase took place. Similarly, if the object size gets smaller, it can be moved back on the depth plane by the amount that is appropriate based on the size of the decrease and the time over which this decrease took place.
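
The size-based adjustment of step 510 can be sketched under a simple pinhole-camera assumption: the apparent linear size of an object is roughly inversely proportional to its distance, so a measured change in on-screen size maps to a proportional change in depth. The function name and the use of on-screen width as the size measure are assumptions; dividing the resulting depth change by the elapsed time between the compared frames would additionally give the rate of the change.

```python
def adjust_moving_object_depth(prev_depth_m, prev_width_px, curr_width_px):
    """Re-place a moving object on the depth map from its change in apparent size.

    Under a pinhole model the apparent (linear) size of an object is inversely
    proportional to its distance, so an object that grows larger between frames
    is moved closer and one that shrinks is moved further away.
    """
    if curr_width_px <= 0:
        return prev_depth_m
    return prev_depth_m * (prev_width_px / curr_width_px)

# Example: an object at 8 m whose on-screen width grows from 100 to 110 pixels
# is re-placed roughly 0.7 m closer to the camera.
print(adjust_moving_object_depth(8.0, 100, 110))  # ~7.27 m
```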

Once the lateral and rotational motion vectors of the camera have been estimated, the absolute motion vectors of the moving objects can also be estimated. The rate of increase of the size of a moving object determines its motion towards the camera (closer in depth) and the rate of decrease of the size determines the motion away from the camera (farther in depth).

According to an embodiment, rate of scaling is a technique where key features of an object are measured between successive frames. Upon approach or retreat of an object from the camera (or camera from the object), object motion vectors may be indicative of the movement. FIG. 10 illustrates diagrams of a micro stereo based technique using frame-by-frame analysis in accordance with an embodiment of the present subject matter. For an object 1000 a at its position at time t that is approaching the image capture device, its location at time t+1 (shown as object 1000 b) will be closer to the image capture device, and therefore it will appear larger. For an object 1002 a at its position at time t that is retreating from the image capture device, its location at time t+1 (shown as object 1002 b) will be further away, and therefore it will appear smaller. For motion directly in line with the image capture device center, the directionality of the mode measurements of pixel movement on the left, right, top, and bottom centers is indicative of the object (or camera) motion toward or away from the camera. Approaching motion may have a negative motion component on the left and top of the object, and a positive component on the right and bottom, such that the magnitude between left and right side vectors and between top and bottom vectors can increase. Retreating motion will have the opposite, such that the same magnitudes can decrease. Without loss of generality, any component of the object (or camera) motion that is away from the center line from the object to the camera can result in a translational component that may change the scale of the vectors, but will still maintain the magnitude change relationship. The rate of change of these magnitudes, combined with a depth estimation, can be indicative of the velocity of this movement.

The combined motion vectors (MV) of moving objects, defined as objects comprising pixels that do not follow the global motion movement, can be calculated using “rate of scaling” techniques to calculate motion vectors for movement in depth and traditional motion compensation methods to calculate movement in x and y coordinates. Enlargement of the measurements, during rate of scaling calculations, indicates that objects move closer to the camera, whereas smaller measurements indicate that objects move away from the camera. The rate of change also determines the motion vectors.
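
A rough sketch of the rate-of-scaling classification described above: motion vectors measured at the left, right, top, and bottom of an object are compared, and growth of the edge-to-edge spans indicates approach while contraction indicates retreat. The sign convention, units, and function name are illustrative assumptions.

```python
def rate_of_scaling(left_mv_x, right_mv_x, top_mv_y, bottom_mv_y, frame_interval_s):
    """Classify motion toward or away from the camera from edge motion vectors.

    left_mv_x / right_mv_x: horizontal motion (pixels per frame) measured at the
    left and right edges of the object; top_mv_y / bottom_mv_y: vertical motion
    at the top and bottom edges. An approaching object spreads outward (left
    edge moves left, right edge moves right), so the edge-to-edge span grows;
    a retreating object contracts. The returned rate (pixels per second) is an
    illustrative proxy for velocity toward or away from the camera.
    """
    growth = (right_mv_x - left_mv_x) + (bottom_mv_y - top_mv_y)
    direction = "approaching" if growth > 0 else "retreating" if growth < 0 else "static"
    return direction, growth / frame_interval_s

# Example: the object's left edge moves 2 px left, right edge 2 px right,
# top 1 px up and bottom 1 px down over one frame at 30 fps -> approaching.
print(rate_of_scaling(-2.0, 2.0, -1.0, 1.0, 1 / 30))
```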

The method of FIGS. 5A and 5B further includes identification of occlusion zones resulting from the movement of objects in the scene (step 512). Once the occlusion area is calculated, a search can be performed in adjacent frames to identify an accurate representation of such areas. If such areas do not exist, then occlusion zones can be calculated based on suitable techniques.

The method of FIGS. 5A and 5B further includes detection and placement of new objects in the depth plane (step 514). New objects can appear in the scene either by panning the camera left or right, or by the objects entering the scene because of their movement. Once a new object is detected, it can be determined whether the object is a static or moving object by, for example, analyzing the motion vectors of the object and comparing the vectors with the motion vectors of the still background objects (step 516). If the objects are static, macro based techniques can be used to determine their depth, and frames with a large stereo base can be used to create their three-dimensional representation (step 518). If the objects are moving objects, micro based techniques can be used to determine their depth, and frames with a small stereo base can be used to create their three-dimensional representation after they have been properly adjusted for motion (step 520). The recognition of new “objects” entering a given scene may also trigger the creation of a new panning sequence.

The method of FIGS. 5A and 5B includes processing a next frame in the sequence, if there is another frame (step 522). If there is another frame, the method proceeds to step 504. Otherwise, the method includes combining the current panning sequence with the one before by equalizing depth and other 3D parameters (step 524), as well as stabilizing for hand-shaking (step 525), and then processing a next panning sequence, if there is another (step 526). The method may then proceed to step 502 for another panning sequence.

The method of FIGS. 5A and 5B further includes combining the results from the macro- and micro-stereo base calculations to create a three-dimensional model for each frame in the panning sequence of a given scene, with the recognition that a given video sequence may include multiple panning sequences, with depth equalization between them based on corresponding features (step 524). Each raster pixel for a frame can have an assigned (x,y,z) triplet, and can be transformed via perspective projection to a triplet (x′,y′,z′) using a targeted angular rotation about the Y axis or a similar view synthesis technique. The screen plane depth may initially be chosen or selected to create the axes for this rotation, and this may be done by selecting a depth from the range represented such that objects in front of the modeled plane are no further than X % of the plane's distance closer to the viewer, with a typical value for X being 25%. The target angular rotation may be selected using an approximate viewing location and screen size for the final video, and is chosen to create an ideal stereo base representation of the entire scene and a comfortable depth range for the viewer. In an embodiment, this depth range is viewer adjustable from the default settings previously calculated using the aforementioned parameters. Following this transform, multiple pixels are projected back to 2D space for viewing, with the recognition that some pixels may occupy the same space (meaning that the closer is viewed and the further is hidden), or that disparity/depth estimates may not exist for some pixels (in a sparse disparity map embodiment), such that some raster locations may not have assigned pixel values. In the latter case, pixel fill and interpolation methodologies, utilizing data available from previous or subsequent frames in the panning sequence, may be performed.
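
The per-pixel perspective transform described in this step can be sketched as a rotation of the (x, y, z) triplets about the Y axis through the chosen screen plane, followed by projection back to 2D. The pinhole projection and parameter names below are assumptions; occlusion handling and hole filling, also described above, are omitted.

```python
import numpy as np

def synthesize_view(xyz, angle_deg, screen_plane_z):
    """Rotate per-pixel (x, y, z) triplets about the Y axis to form a second view.

    xyz: (H, W, 3) array of per-pixel coordinates, with z the assigned depth.
    screen_plane_z: depth chosen as the screen plane; the rotation axis passes
    through it so objects in front of the plane come out of the screen.
    After rotation, points are projected back to 2D by perspective division.
    """
    a = np.radians(angle_deg)
    rot_y = np.array([[np.cos(a), 0.0, np.sin(a)],
                      [0.0,       1.0, 0.0],
                      [-np.sin(a), 0.0, np.cos(a)]])
    shifted = xyz.astype(float).copy()
    shifted[..., 2] -= screen_plane_z        # place the rotation axis at the screen plane
    rotated = shifted @ rot_y.T
    rotated[..., 2] += screen_plane_z
    z = np.clip(rotated[..., 2], 1e-6, None)
    u = rotated[..., 0] / z                  # perspective projection back to 2D
    v = rotated[..., 1] / z
    return u, v, rotated[..., 2]
```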

The creation and presentation, such as display, of three-dimensional videos of a scene in accordance with embodiments of the present subject matter may be implemented by a single device or combination of devices. In one or more embodiments of the present subject matter, images may be captured by a camera such as, but not limited to, a digital camera. The camera may be connected to a personal computer for communication of the captured images to the personal computer. The personal computer may then generate one or more three-dimensional videos in accordance with embodiments of the present subject matter. After generation of the three-dimensional images, the personal computer may communicate the three-dimensional videos to the camera for display on a suitable three-dimensional display. The camera may include a suitable three-dimensional display. Also, the camera may be in suitable electronic communication with a high-definition television for display of the three-dimensional videos on the television. The communication of the three-dimensional videos may be, for example, via an HDMI connection.

In one or more other embodiments of the present subject matter, three-dimensional videos may be generated by a camera and displayed by a separate suitable display. For example, the camera may capture conventional two-dimensional images and then use the captured images to generate three-dimensional videos. The camera may be in suitable electronic communication with a high-definition television for display of the three-dimensional videos on the television. The communication of the three-dimensional videos may be, for example, via an HDMI connection.

The subject matter disclosed herein may be implemented by a suitable electronic device having one or more processors and memory, such as a digital still camera, a video camera, a mobile phone, a smart phone, or the like. In order to provide additional context for various aspects of the disclosed subject matter, FIG. 11 and the following discussion are intended to provide a brief, general description of components of a suitable electronic device 1100 in which various aspects of the disclosed subject matter may be implemented. While the present subject matter is described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices, those skilled in the art will recognize that the disclosed subject matter can also be implemented in combination with other program modules and/or as a combination of hardware and software.

Generally, however, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular data types. The operating environment 1100 is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the subject matter disclosed herein. Other well-known computer systems, environments, and/or configurations that may be suitable for use with the subject matter include, but are not limited to, personal computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include the above systems or devices, and the like.

With reference to FIG. 11, an exemplary environment 1100 for implementing various aspects of the present subject matter disclosed herein includes a computer 1102. The computer 1102 includes a processing unit 1104, a system memory 1106, and a system bus 1108. The system bus 1108 couples system components including, but not limited to, the system memory 1106 to the processing unit 1104. The processing unit 1104 can be any of various available processors. Dual microprocessors and other multiprocessor architectures also can be employed as the processing unit 1104.

The system bus 1108 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any of a variety of available bus architectures including, but not limited to, 8-bit bus, Industrial Standard Architecture (ISA), Micro-Channel Architecture (MCA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), Advanced Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), and Small Computer Systems Interface (SCSI).

The system memory 1106 includes volatile memory 1110 and nonvolatile memory 1112. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 1102, such as during start-up, is stored in nonvolatile memory 1112. By way of illustration, and not limitation, nonvolatile memory 1112 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory 1110 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

Computer 1102 also includes removable/nonremovable, volatile/nonvolatile computer storage media. FIG. 11 illustrates, for example, disk storage 1114. Disk storage 1114 includes, but is not limited to, devices like a magnetic disk drive, floppy disk drive, tape drive, Jaz drive, Zip drive, LS-100 drive, flash memory card, or memory stick. In addition, disk storage 1114 can include storage media separately or in combination with other storage media including, but not limited to, an optical disk drive such as a compact disk ROM device (CD-ROM), CD recordable drive (CD-R Drive), CD rewritable drive (CD-RW Drive), or a digital versatile disk ROM drive (DVD-ROM). To facilitate connection of the disk storage devices 1114 to the system bus 1108, a removable or nonremovable interface is typically used, such as interface 1116.

It is to be appreciated that FIG. 11 describes software that acts as an intermediary between users and the basic computer resources described in suitable operating environment 1100. Such software includes an operating system 1118. Operating system 1118, which can be stored on disk storage 1114, acts to control and allocate resources of the computer system 1102. System applications 1120 take advantage of the management of resources by operating system 1118 through program modules 1122 and program data 1124 stored either in system memory 1106 or on disk storage 1114. It is to be appreciated that the subject matter disclosed herein can be implemented with various operating systems or combinations of operating systems.

A user enters commands or information into the computer 1102 through input device(s) 1126. Input devices 1126 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to the processing unit 1104 through the system bus 1108 via interface port(s) 1128. Interface port(s) 1128 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB). Output device(s) 1130 use some of the same types of ports as input device(s) 1126. Thus, for example, a USB port may be used to provide input to computer 1102 and to output information from computer 1102 to an output device 1130. Output adapter 1132 is provided to illustrate that there are some output devices 1130, like monitors, speakers, and printers among other output devices 1130, that require special adapters. The output adapters 1132 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 1130 and the system bus 1108. It should be noted that other devices and/or systems of devices provide both input and output capabilities, such as remote computer(s) 1134.

Computer 1102 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 1134. The remote computer(s) 1134 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor-based appliance, a peer device or other common network node, and the like, and typically includes many or all of the elements described relative to computer 1102. For purposes of brevity, only a memory storage device 1136 is illustrated with remote computer(s) 1134. Remote computer(s) 1134 is logically connected to computer 1102 through a network interface 1138 and then physically connected via communication connection 1140. Network interface 1138 encompasses communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet/IEEE 802.3, Token Ring/IEEE 802.5, and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).

Communication connection(s) 1140 refers to the hardware/software employed to connect the network interface 1138 to the bus 1108. While communication connection 1140 is shown for illustrative clarity inside computer 1102, it can also be external to computer 1102. The hardware/software necessary for connection to the network interface 1138 includes, for exemplary purposes only, internal and external technologies such as modems, including regular telephone grade modems, cable modems and DSL modems, ISDN adapters, and Ethernet cards.

The various techniques described herein may be implemented with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the disclosed embodiments, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the subject matter. In the case of program code execution on programmable computers, the computer will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs are preferably implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and combined with hardware implementations.

The described methods and apparatus may also be embodied in the form of program code that is transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via any other form of transmission, wherein, when the program code is received and loaded into and executed by a machine, such as an EPROM, a gate array, a programmable logic device (PLD), a client computer, a video recorder, or the like, the machine becomes an apparatus for practicing the subject matter. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates to perform the processing of the present subject matter.

While the embodiments have been described in connection with the preferred embodiments of the various figures, it is to be understood that other similar embodiments may be used or modifications and additions may be made to the described embodiment for performing the same function without deviating therefrom. Therefore, the disclosed embodiments should not be limited to any single embodiment, but rather should be construed in breadth and scope in accordance with the appended claims.

What is claimed:
1. A method for creating a three-dimensional video sequence of a scene, the method comprising: using at least one processor and memory for: receiving a two-dimensional video sequence of a scene, the two-dimensional video sequence including a plurality of frames; selecting a target frame from among the plurality of frames; selecting a first subset of frames, N, from among the plurality of frames that are associated with the target frame, representative of a targeted stereo displacement of the capture device from the target frame position; analyzing the first subset of frames to identify two images for use in forming a stereoscopic pair of frames with a predetermined spatial difference; identifying and classifying static and moving objects between the stereoscopic pair of frames; extracting depth data of static objects in the stereoscopic pair of frames; selecting a second subset of frames, n, from among the plurality of frames that are associated with the target frame, representative of a stereo displacement that is substantially smaller than the displacement represented by N; utilizing the second subset of frames to calculate depth of moving objects; combining the depth values of static and moving objects based on an absolute depth of static objects and the relative depth of moving objects; and generating a three-dimensional video frame consisting of the target frame and a depth-generated view corresponding to the target frame based on the depth data.
2. The method of claim 1, further comprising one or more steps of identifying suitable frames, registration, stabilization, color correction, transformation, and depth adjustment.
3. The method of claim 1, further comprising generating one or more additional frames and frame viewpoints using one of existing raster data and depth information.
4. The method of claim 1, further comprising using a micro stereo base technique for generating image representations of close and moving objects of the scene.
5. The method of claim 1, further comprising using a macro stereo base technique for generating image representations of a background and non-moving objects of the scene.
6. The method of claim 1, further comprising using an image capture device for capturing the two-dimensional video sequence.
7. The method of claim 1, further comprising using the steps of claim 1 for generating a plurality of three-dimensional video frames.
8. The method of claim 1, further comprising dividing the captured two-dimensional video sequence into segments by utilizing one of scene change detection and camera pose information.
9. The method of claim 8, further comprising creating a stereoscopic video sequence from each segment.
10. The method of claim 9, further comprising equalizing the depths and other three-dimensional parameters and combining the individual stereoscopic segments to form a single three-dimensional video stream.
11. The method of claim 1, further comprising analyzing captured frames to measure object displacements between two or more frames via motion vectors and identifying and classifying static and moving objects within the current target frame.
12. The method of claim 1, further comprising: analyzing captured frames; and identifying a position of key static objects to estimate the position of the camera.
13. The method of claim 1, further comprising measuring the motion vectors of the moving objects and estimating their relative position on the x, y space.
14. The method of claim 1, further comprising: measuring sizes of moving objects; and estimating their relative position on the z-space based on the rate of increase or decrease of their sizes.
15. The method of claim 1, further comprising estimating the depth of moving objects by identifying the depth of key points of moving objects relative to the depth of neighboring static objects with known depths and extrapolating based on temporal or structural relationships.
16. The method of claim 1, further comprising estimating the depth of moving objects by determining whether they are moving behind or in front of static objects and utilizing the depth of static objects to estimate the depth of a moving object based on the moving object's trajectory in time.
17. The method of claim 1, further comprising: segmenting the static portions of the scene into static objects with known positions in three-dimensional space; and utilizing prediction techniques to estimate the depth of new information entering into the captured video sequence associated with the same or related static objects.
18. The method of claim 1, further comprising: measuring sizes of static objects; and estimating changes in camera focal length based on their relative rate of increase or decrease over time and adjusting the depth of the scene based on the relative zoom factor of the camera.
19. A system for creating a three-dimensional video sequence of a scene, the system comprising: a memory having stored therein computer program code; a computer processor that executes the computer program code; a video generator configured to: receive a two-dimensional video sequence of a scene, the two-dimensional video sequence including a plurality of frames; select a target frame from among the plurality of frames; select a first subset of frames, N, from among the plurality of frames that are associated with the target frame, representative of a targeted stereo displacement of the capture device from the target frame position; analyze the first subset of frames to identify two images for use in forming a stereoscopic pair of frames with a predetermined spatial difference; extract depth data of static objects in the stereoscopic pair of frames; select a second subset of frames, n, from among the plurality of frames that are associated with the target frame, representative of a stereo displacement that is substantially smaller than the displacement represented by N; utilize the second subset of frames to calculate depth of moving objects; combine the depth values of static and moving objects based on the absolute depth of static objects and the relative depth of moving objects; and generate a three-dimensional video frame consisting of the target frame and a depth-generated view corresponding to the target frame based on the depth data.
20. The system of claim 19, wherein the video generator is configured to implement one or more functions of identifying suitable frames, registration, stabilization, color correction, transformation, and depth adjustment.